Abstract

We investigate the mixing rate of a Markov chain in which long distance edges are combined with non-reversibility. As a first step, we focus on the following graphs: starting from a cycle graph, we select random nodes and add all edges connecting them. We prove a square factor improvement of the mixing rate compared to the reversible version of the Markov chain.

1 Introduction

We study the mixing properties of certain Markov chains, which describe how fast the distribution of the state approaches the stationary distribution regardless of the initial conditions. The overall goal is to provide a significant improvement of mixing by minor modifications of the Markov chain.

Mixing time and mixing rate are fundamental quantities in the study of Markov chains and are the object of active research; they are also highly relevant to applications where mixing properties are strongly tied to performance metrics. This is for example the case for Markov chain Monte Carlo methods, which provide cheap approximations of integrals and also allow sampling from complex distributions that would otherwise be hard to generate directly [7]. Markov chains also provide a powerful scheme for approximating the volumes of high dimensional convex bodies. A different application, average consensus, involves the distributed computation of the average of initial values at different agents in a multi-agent system (values which might correspond to measurements, opinions, etc.). This can be achieved using a Markov chain for which the stationary distribution is uniform. The initial values can be viewed as a probability distribution scaled by a constant, and the Markov chain will approach the uniform distribution multiplied by the same constant, so the average of the initial values will eventually be present at each node. Efficient consensus and average consensus approaches also play a role in several recent distributed optimization algorithms. For all these applications good performance is crucial and it is determined by the dynamics of the underlying Markov chain. Mixing properties are formulated exactly to answer such questions, and they are the topic of the current paper.

There are several ways of obtaining good mixing performance. In many applications, the graph of possible transitions is determined by the problem definition, but the specific transition probabilities can be chosen. When these are required to satisfy some strong symmetry properties (reversibility, described later in detail), choosing those optimizing the mixing rate can be formulated as an SDP problem [2], [3], which can be solved numerically using standard methods. Departing from these symmetry properties brings a strong technical challenge, but at the same time it can lead to significant improvement; the mixing time can in some cases drop to its square root. On the other hand, when it is possible to modify the graph of possible transitions, an astonishing speedup can also be obtained by adding even a small number of randomly selected edges.

Our goal in this paper is to study the speedup that can be achieved by adding a small number of random edges and making the chain extremely non-reversible. We start with a cycle graph of $n$ nodes, select a smaller number $k$ of them to become hubs, then add extra edges between the hubs. This scheme is motivated by one of the renowned models of Small World Networks, the Newman-Watts model [5]. The cycle presents a natural way of including asymmetry by introducing a drift, meaning increased clockwise transition probabilities along the cycle and decreased counter-clockwise ones. At this stage the model needs three parameters to be specified: the placement of the hubs, the added interconnection structure on them, and the asymmetry introduced along the cycle.

In this paper we consider the model where hubs are chosen randomly, all edges between hubs are included and asymmetry is taken to the extreme: the Markov chain is a pure drift along the cycle taking deterministic clockwise steps (except at the hubs). To better understand the dynamics of the process, observe that the state of the Markov chain can be described by an arc (as the cycle is split by the hubs) and the position within that arc. The main challenge here is to show that mixing happens both in terms of arcs and in terms of positions within them. We claim that reaching a mixing rate of $\Omega(k/n)$ up to $\log n$ factors is possible and we prove it for $k = n^\sigma$ where $0 < \sigma < 1$.

In comparison, if we were to put pure drift along the cycle but with equidistant hubs, we would have rapid mixing in terms of the arc (a perfect one after leaving the first arc), but no mixing at all in terms of the position on the arc. Even when decreasing the drift or changing the interconnection structure, the mixing rate remains $O((k/n)^2)$.

Furthermore, if we were to stay with the classical, symmetric transitions along the cycle, the mixing rate would again be $O((k/n)^2)$: for an arc at least $n/k$ long even the hitting time of the ends from the middle is $\Omega((n/k)^2)$. This holds for any hub placement and interconnection structure.
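
The quadratic hitting time behind this bound can be checked on a small case. The following is a minimal sketch, assuming that the interior of an arc behaves as a symmetric walk taking steps of +1 or -1 with probability 1/2 each; the function name `expected_exit_times` is introduced here for illustration only. It solves the linear system for the expected number of steps needed to reach either end of an arc, which can be compared with the classical closed form $i(\ell - i)$ for a walk on $0, \ldots, \ell$.

```python
def expected_exit_times(length):
    """Expected steps for a symmetric +-1 walk on 0..length to hit an end.

    Solves h[i] = 1 + (h[i-1] + h[i+1]) / 2 with h[0] = h[length] = 0
    using forward elimination and back substitution on the
    tridiagonal system (requires length >= 2).
    """
    m = length - 1                      # number of interior positions
    sub, diag, sup, rhs = -0.5, 1.0, -0.5, 1.0
    c_prime = [0.0] * m
    d_prime = [0.0] * m
    c_prime[0] = sup / diag
    d_prime[0] = rhs / diag
    for i in range(1, m):
        denom = diag - sub * c_prime[i - 1]
        c_prime[i] = sup / denom
        d_prime[i] = (rhs - sub * d_prime[i - 1]) / denom
    h = [0.0] * (length + 1)            # boundary values stay 0
    h[m] = d_prime[m - 1]
    for i in range(m - 1, 0, -1):
        h[i] = d_prime[i - 1] - c_prime[i - 1] * h[i + 1]
    return h
```

From the middle of an arc of length $\ell$ the expected exit time is about $\ell^2/4$, which for $\ell = n/k$ is of the order $(n/k)^2$ mentioned in the text.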

Overall, we want to emphasize that a speedup to a mixing rate of $\Omega(k/n)$ becomes feasible only when both random hubs and a drift along the cycle are implemented.

The rest of the paper is organized as follows. In Section 2 we formally describe the random graph model and Markov chain on which we focus. In Section 3 we prove the main mixing rate result for a proxy graph model. We then translate our result to the primary graph model in Section 4. In Section 5 simulations are presented complementing our asymptotic analytical results. We also demonstrate how the mixing rate changes when the drift is decreased for the model, suggesting that further performance improvements might be possible. We draw conclusions and outline possible future research directions in Section 6.

2 Graph models, Markov chains and mixing rates 

The concept of the graphs we consider is the following. We start with a cycle with $n$ nodes, and randomly select a small number of vertices, $n^\sigma$ out of the total of $n$ for some $0 < \sigma < 1$, which become hubs. Then we connect all hub nodes with each other. Let us now present the precise definitions.

Definition 1. Given $n, k \in \mathbb{Z}^+$ we define the random graph distribution $B_n(k)$ as follows. Starting from a cycle graph on $n$ nodes, we select uniformly at random $k$ different edges, which we remove. For the $i$th remaining arc, $1 \le i \le k$, we mark the clockwise endpoint as $a_i$ and the other end as $b_i$. Then we add all edges $(b_i, a_j)$, for all $1 \le i, j \le k$. An example is given in Figure 1.
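
As an illustration of Definition 1, the arc lengths of a graph from $B_n(k)$ can be sampled as follows. This is a minimal sketch: the function name `sample_cycle_arcs` and the convention that edge `j` joins node `j` to node `(j + 1) % n` are assumptions made here, not part of the definition.

```python
import random

def sample_cycle_arcs(n, k, rng=random):
    """Remove k distinct, uniformly chosen edges of an n-cycle.

    Edge j joins node j to node (j + 1) % n (an indexing assumption).
    Returns the node counts of the k remaining arcs, in cyclic order.
    """
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    cuts = sorted(rng.sample(range(n), k))
    # The arc between consecutive removed edges cuts[i-1] and cuts[i]
    # contains the nodes cuts[i-1] + 1, ..., cuts[i].
    lengths = [cuts[i] - cuts[i - 1] for i in range(1, k)]
    # The last arc wraps around the end of the index range.
    lengths.append(n - cuts[-1] + cuts[0])
    return lengths
```

Every arc contains at least one node and the lengths add up to $n$, so the hubs $a_i$, $b_i$ are simply the first and last node of each arc.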

We introduce an alternative model which reflects approximately the same concept but is technically more 
convenient due to the added independence. 

Definition 2. Given $L \in \mathbb{R}^+$, $k \in \mathbb{Z}^+$ we define the random graph distribution $B(L, k)$. Let us take independently the $k$ random variables

$$L_i \sim \mathrm{Geo}(1/L), \quad i = 1, \ldots, k,$$

where $\mathrm{Geo}(p)$ denotes the geometric distribution with parameter $p$ (and 1 as the smallest possible value). We begin with a graph that is the disjoint union of $k$ arcs, paths of length $L_i$ (counted in nodes), each of which has a "start point" $a_i$ and an "end point" $b_i$. Then we add all edges $(b_i, a_j)$, for $1 \le i, j \le k$.

Figure 1: Example graph from $B_n(k)$ (see Definition 1).


Based on the uniform nature of both constructions we see the following relation between the two models: 

Proposition 3. Given $n, k \in \mathbb{Z}^+$, consider the random graphs $B(n/k, k)$. Conditioning on the event of having exactly $n$ nodes, we get the distribution of $B_n(k)$.

We are interested in the mixing behavior of Markov chains on these graphs. A Markov chain is reversible if, in the stationary regime, for any edge $(u, v)$ of the graph the probability of the $u \to v$ transition is the same as that of the $v \to u$ transition. In this paper we leave the comfortable domain of reversible Markov chains. Let us now introduce the chains we will focus on.

Definition 4. For any graph coming from $B_n(k)$ or $B(L, k)$ we define the pure drift Markov chain as follows. Within any arc we set transition probabilities to 1 along the arc all the way from $a_i$ to $b_i$. From any $b_i$, we set transition probabilities to $1/k$ towards all $a_j$. A part of such a chain is visualized in Figure 2.
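
The construction can be sketched in a few lines for the model $B(L, k)$. This is only an illustration under stated assumptions: the names `sample_arc_lengths` and `pure_drift_matrix` are ours, nodes are numbered arc by arc, and an arc with a single node is treated as having $a_i = b_i$.

```python
import random

def sample_arc_lengths(L, k, rng=random):
    """Draw k i.i.d. Geo(1/L) arc lengths with support {1, 2, ...}."""
    lengths = []
    for _ in range(k):
        length = 1
        while rng.random() >= 1.0 / L:  # continue with probability 1 - 1/L
            length += 1
        lengths.append(length)
    return lengths

def pure_drift_matrix(lengths):
    """Transition matrix (list of rows) of the pure drift chain.

    Each arc runs from a_i (its first node) to b_i (its last node).
    """
    k = len(lengths)
    n = sum(lengths)
    starts = []
    pos = 0
    for length in lengths:
        starts.append(pos)
        pos += length
    P = [[0.0] * n for _ in range(n)]
    for start, length in zip(starts, lengths):
        for node in range(start, start + length - 1):
            P[node][node + 1] = 1.0      # deterministic step along the arc
        end = start + length - 1         # this is b_i
        for a in starts:
            P[end][a] = 1.0 / k          # uniform jump to some a_j
    return P
```

For example, `pure_drift_matrix([3, 1, 4])` gives an 8 by 8 matrix whose rows and columns all sum to 1.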



Figure 2: Example arc from the pure drift Markov chain (see Definition 4).

This is a Markov chain with a doubly stochastic transition matrix, therefore the stationary distribution is uniform. We want to quantify the asymptotic rate at which a starting distribution approaches the stationary distribution. This mixing performance of the Markov chain will be measured by the mixing rate:

Definition 5. For a Markov chain with transition matrix $P$ we define the mixing rate $\lambda$ as

$$\lambda = \min \{ 1 - |\mu| : \mu \ne 1 \text{ is an eigenvalue of } P \}.$$
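
As a small worked example of this definition, the sketch below evaluates the mixing rate from a given list of eigenvalues. The helper name `mixing_rate` and the tolerance used to recognize the eigenvalue 1 are assumptions of this illustration. For the directed cycle on $n$ nodes the eigenvalues are the $n$th roots of unity, all of modulus 1, so the mixing rate is 0.

```python
import cmath

def mixing_rate(eigenvalues, tol=1e-9):
    """Return min(1 - |mu|) over the eigenvalues mu that differ from 1."""
    others = [mu for mu in eigenvalues if abs(mu - 1) > tol]
    return min(1 - abs(mu) for mu in others)

# Directed n-cycle: the eigenvalues are the n-th roots of unity.
n = 6
cycle_eigs = [cmath.exp(2j * cmath.pi * j / n) for j in range(n)]
cycle_rate = mixing_rate(cycle_eigs)   # zero up to rounding
```

A chain with eigenvalues 1, 0.5, -0.8 and 0.3i would have mixing rate 0.2, determined by the eigenvalue of largest modulus other than 1.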

A natural question arises: why do we fix the transition probabilities to be 1 along the arcs? This excludes any mixing while the Markov chain is moving along the arcs. There are multiple reasons to do so. First, this demonstrates the power of just the added edges combined with the various arc lengths, as it already provides fast mixing not only among, but also within the arcs. Second, it is more convenient to analyze such a pure drift version.

Nevertheless, it is natural to ask what happens when some diffusion is introduced along the arcs. The magnitude of the mixing rate will not change substantially as it is already almost optimal for the pure drift Markov chain (up to logarithmic factors). Still, some improvement does happen, as will be shown by simulations in Section 5.

We now focus on estimating the mixing rate of the pure drift Markov chain for $B(L, k)$. Afterwards, we will transfer these results to the other model $B_n(k)$.

3 Random polynomials for pure drift Markov chains 

In order to find the mixing rate of a Markov chain, we have to know the eigenvalues of its transition matrix. 
For the current case we transform this eigenvalue problem into finding the roots of a certain polynomial. 

Proposition 6. Let us consider the pure drift Markov chain on a random graph from $B(L, k)$. Define also

$$q(z) = \sum_{i=1}^{k} z^{-L_i}. \tag{1}$$

Then $\mu \in \mathbb{C}$ is an eigenvalue of the transition matrix $P$ iff $q(\mu) = k$.

Proof. Assuming $\mu$ is an eigenvalue of the transition matrix $P$, let us find the corresponding eigenvector $x$. Observe that each $a_i$ has incoming edges from exactly the same nodes and with the same weights, so the eigenvector must take the same value $x_{a_i} = h$ at each of them for some $h$.

Along the arcs, for two subsequent nodes $p, p^+$, from $xP = \mu x$ we get

$$x_p = \mu x_{p^+}.$$

This implies that along the $i$th arc we see the values $h, h\mu^{-1}, \ldots, h\mu^{-L_i+1}$. This already completely determines $x$ up to scaling and ensures the eigenvalue equation for all nodes except the $a_i$. We get a valid eigenvector if the equation also holds for the $a_i$, which takes the form

$$\sum_{i=1}^{k} \frac{1}{k} h \mu^{-L_i+1} = \mu h,$$

which is equivalent to $q(\mu) = k$.

For the other direction, given a $\mu$ such that $q(\mu) = k$, we can again build $x$ by setting $1, \mu^{-1}, \ldots, \mu^{-L_i+1}$ on each arc, and this will clearly be an eigenvector of $P$ with eigenvalue $\mu$. □
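
The characterization can be tried out numerically. The sketch below is an illustration under our own assumptions: multiplying $q(z) = k$ by $z^{L_{\max}}$ turns it into a polynomial equation, which is solved here with the Durand-Kerner iteration, a standard root finder chosen only because it fits in a few lines of plain Python. The function names are hypothetical. Note that $q$ is not defined at $z = 0$, so an eigenvalue 0 of $P$ is not returned; it could not attain the minimum in the mixing rate anyway.

```python
def drift_eigenvalues(lengths, iterations=500):
    """Solutions of q(z) = k for the given arc lengths.

    Solves k * z**Lmax - sum_i z**(Lmax - L_i) = 0 with the
    Durand-Kerner iteration, updating the root estimates in place.
    """
    k, lmax = len(lengths), max(lengths)
    coeffs = [0.0] * (lmax + 1)          # coeffs[d] multiplies z**d
    coeffs[lmax] = float(k)
    for length in lengths:
        coeffs[lmax - length] -= 1.0
    monic = [c / k for c in coeffs]

    def poly(z):
        value = 0j
        for c in reversed(monic):        # Horner evaluation
            value = value * z + c
        return value

    roots = [(0.4 + 0.9j) ** d for d in range(lmax)]
    for _ in range(iterations):
        for i in range(lmax):
            denom = 1 + 0j
            for j in range(lmax):
                if j != i:
                    denom *= roots[i] - roots[j]
            roots[i] -= poly(roots[i]) / denom
    return roots

def drift_mixing_rate(lengths, tol=1e-6):
    """min(1 - |z|) over the solutions z of q(z) = k other than z = 1."""
    roots = drift_eigenvalues(lengths)
    return min(1 - abs(z) for z in roots if abs(z - 1) > tol)
```

For the arc lengths 1 and 2 the equation reads $2z^2 - z - 1 = 0$ with solutions $1$ and $-1/2$, so `drift_mixing_rate([1, 2])` evaluates to 0.5 up to rounding.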

By Definition 5 the mixing rate is high if the transition matrix $P$ has no eigenvalue near the complex unit circle. Therefore to get a lower bound on the mixing rate we have to exclude a ring shaped domain for the eigenvalues. The region to be avoided is

$$R_\gamma = \left\{ z : 1 - \frac{1}{L \log^\gamma k} \le |z| \le 1, \ z \ne 1 \right\}, \tag{2}$$

where $\gamma$ is a parameter to be chosen later. We will show that asymptotically almost surely (a.a.s.) no eigenvalue of $P$ falls in $R_\gamma$. The width of the ring should be viewed as follows. We assume $L$ and $k$ are of similar magnitudes, meaning that they have a polynomial growth rate w.r.t. each other. Therefore the width is at most a logarithmic factor lower than $1/L$. Our key result is the following:

Theorem 7. Assume $k, L \to \infty$ while $\rho_l < \log L / \log k < \rho_u$ for some constants $0 < \rho_l < \rho_u < \infty$, and fix $\gamma > 4$. Based on the graph model $B(L, k)$ we use $q(z)$ from (1) and $R_\gamma$ from (2). Then for any $c, d > 0$ we have

$$P(\exists z \in R_\gamma, \ q(z) = k) = O(k^{-c} L^{-d}).$$

In particular, for the mixing rate we get $\lambda > 1/(L \log^\gamma k)$ a.a.s.

Note that the right hand side can be simplified a bit. The relation between $L$ and $k$ ensures $k^{\rho_l} < L < k^{\rho_u}$ after a while. Therefore the $L$ term on the right hand side can be replaced by a power of $k$, and it is sufficient to show that the probability in question is $O(k^{-c})$ for all $c > 0$.

We show the claim in four steps. First we ensure that we can assume the $L_i$ variables to be bounded with high probability; this will make further estimates possible. We then check $z$ coming from different parts of $R_\gamma$: we start with positive reals, then we treat complex numbers in two different ways depending on their arguments.

Intuitively the reason is the following. For a real $0 < z < 1$, $q(z)$ will be too large. Next, take $z$ with a small argument: now all $z^{-L_i}$ will be in the same half-plane, giving a non-zero imaginary part to $q(z)$. When the argument is large enough that $q(z)$ has a chance to have zero imaginary part again, the $z^{-L_i}$ will point in so many different directions that the cancellations will force the real part below $k$, a.a.s.

Now let us make all this precise. We will confirm that each of the intuitive steps above works with high probability, whenever the $L_i$ are different enough and are not extremely large. We then join these steps to give a proof of the theorem.

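The probabilistic upper bound we need on the $L_i$ can be formulated in the following way:

Lemma 8. For any $C > 1$, there holds

$$P\left(\max_i L_i > CL \log k\right) = O(k^{1-C}).$$

Proof. Remember that the $L_i$ are i.i.d. variables with law $\mathrm{Geo}(1/L)$. Therefore we have that

$$P(L_i > CL \log k) = P(L_i > \lfloor CL \log k \rfloor) = \left(1 - \frac{1}{L}\right)^{\lfloor CL \log k \rfloor} \le \left(1 - \frac{1}{L}\right)^{CL \log k - 1}.$$

Knowing $(1 - 1/L)^L < 1/e$ we get

$$P(L_i > CL \log k) \le \left(1 - \frac{1}{L}\right)^{-1} e^{-C \log k} = O(k^{-C}).$$

To treat all $L_i$ together for $1 \le i \le k$, we use a simple union bound:

$$P\left(\max_i L_i > CL \log k\right) \le k \cdot O(k^{-C}) = O(k^{1-C}).$$

□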
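
The tail estimate for the longest arc can be evaluated exactly for concrete parameters. The following is a minimal sketch with hypothetical function names; it compares the exact probability that the maximum exceeds $CL \log k$ with the union bound over the $k$ arcs. The reference value $2k^{1-C}$ used in the usage note is an assumption of this sketch: the factor 2 absorbs the rounding of the threshold to an integer when $L \ge 2$.

```python
import math

def max_tail_exact(L, k, C):
    """Exact P(max_i L_i > C*L*log k) for k i.i.d. Geo(1/L) variables."""
    threshold = math.floor(C * L * math.log(k))
    single = (1 - 1 / L) ** threshold      # P(L_i > threshold)
    return 1 - (1 - single) ** k

def union_bound(L, k, C):
    """k * P(L_1 > C*L*log k), the union bound over all k arcs."""
    threshold = math.floor(C * L * math.log(k))
    return k * (1 - 1 / L) ** threshold
```

For instance, with $L = 10$, $k = 20$ and $C = 2$ the exact probability stays below the union bound, which in turn stays below $2k^{1-C} = 0.1$.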


From now on, we will only investigate the (a.a.s.) event that the maximal $L_i$ is small as shown in Lemma 8. Let us call this event $S(C)$. We now check $z$ coming from different parts of $R_\gamma$. The simplest case is when $z$ is a positive real:

Lemma 9. Assume $z \in (0, 1)$. Then $q(z) \ne k$.

Proof. In this case, $q(z)$ is composed of $k$ positive terms, each of them being larger than 1. Consequently the sum is higher than $k$. □

Next we show that there is no $z \in R_\gamma$ with small argument for which $q(z) = k$.

Lemma 10. Assume $S(C)$ and take $z \in R_\gamma$ such that $0 < |\arg(z)| < \pi/(CL \log k)$. Then $\Im q(z) \ne 0$. Consequently $q(z) \ne k$.

Proof. Without loss of generality, assume $\arg(z) > 0$. The event $S(C)$ ensures $L_i \le CL \log k$, so the $z^{-L_i}$ will all be in the same half-plane: they will have an argument in $(-\pi, 0)$. For all these values, the imaginary part is negative, and so is their sum. Therefore it simply cannot be 0. □

It remains to check the elements of $R_\gamma$ whose argument is "large". The arguments in question are those in

$$A = \left[ \frac{\pi}{CL \log k}, \ 2\pi - \frac{\pi}{CL \log k} \right]. \tag{3}$$

We argue that the angles of the $z^{-L_i}$ become so different that strong cancellations will happen. Our main proposition formally stating this is the following:

Proposition 11. Choose constants $\alpha, \beta > 1$ and also $\rho_l, \rho_u$ as in Theorem 7 and require $\rho_l < \log L / \log k < \rho_u$. We use $k, L, L_i$ as in Definition 2. We define

$$m = \log^\alpha k, \qquad \delta = \log^{-\beta} k.$$

Then for $k, L$ large enough we have

$$P\left( \sup_{x \in A} \sum_{i=1}^{m} \cos^+(L_i x) > m - \delta^2 \right) \le \frac{1}{3},$$

where $\cos^+(y) = \max(\cos(y), 0)$.

Proof. Broadly speaking we want one of the $L_i x$ terms to be far from the multiples of $2\pi$, which should decrease the sum of the cosines enough. For a single $x$, we state the following lemma.

Lemma 12. Use $k, L, L_i$ as in Definition 2. Fix $\beta > 1$, $x \in A$ and choose $D \subset [0, 2\pi]$ which is a modulo $2\pi$ interval with $|D| = 6\delta = 6\log^{-\beta} k$. Then for $k, L$ large enough we have

$$P(\{L_i x\} \in D) \le \frac{3}{4},$$

where $\{a\}$ stands for $a \bmod 2\pi$.

Proof. Each element of the series $\{x\}, \{2x\}, \{3x\}, \ldots$ is either in $D$ or not. We can therefore split the series into blocks that are in $D$ and blocks that are not. Let $t_1$ be the first coefficient such that $\{t_1 x\} \in D$, then we can define the blocks

$$\{t_i x\}, \{(t_i + 1)x\}, \ldots, \{(s_i - 1)x\} \in D,$$
$$\{s_i x\}, \{(s_i + 1)x\}, \ldots, \{(t_{i+1} - 1)x\} \notin D.$$

We will show the statement

$$P(L_i \in [t_i, s_i - 1]) \le 3P(L_i \in [s_i, t_{i+1} - 1]). \tag{4}$$

This is sufficient to complete the proof because by summing them all (and also including the numbers $1, 2, \ldots, t_1 - 1$) we get

$$P(\{L_i x\} \in D) \le 3P(\{L_i x\} \notin D),$$

and this immediately confirms the lemma.

We now compare the number of elements in the blocks above. We want to relate $t_{i+1} - s_i$ with $s_i - t_i$. For $k, L$ large enough we have $|D| = 6\delta < \pi$. Then we have

$$s_i - t_i \le \frac{|D|}{\min(x, 2\pi - x)} + 1 \le \frac{2\pi - |D|}{\min(x, 2\pi - x)} + 1 \le t_{i+1} - s_i + 1.$$

The $+1$ term is needed to compensate possible boundary effects. The same claim holds for $x > \pi$; it corresponds to moving around the angles in the other direction. Using $t_{i+1} - s_i \ge 1$ we also have

$$\frac{s_i - t_i}{2} \le t_{i+1} - s_i. \tag{5}$$

We now show an upper bound on $s_i - t_i$ by comparing the interval length with the smallest possible step size. Remember that the step size $x$ must fall in $A$ defined in (3).

$$s_i - t_i \le \frac{6\delta}{\min\{\min(x, 2\pi - x) : x \in A\}} + 1 = \frac{6 \log^{-\beta} k \cdot CL \log k}{\pi} + 1 < 2CL \log^{1-\beta} k + 1. \tag{6}$$

After deducing inequalities (5), (6) on the number of elements in the blocks, we also compare the probabilities of $L_i$ falling within them. The probability of $L_i = j$ decreases with $j$. Therefore the probability of $L_i$ falling into an interval is at most twice the probability of falling into the first half of the interval:

$$P(L_i \in [t_i, s_i - 1]) = P(L_i \in [t_i, t_i + (s_i - t_i) - 1]) \le 2P\left(L_i \in \left[t_i, t_i + \frac{s_i - t_i}{2} - 1\right]\right). \tag{7}$$

As $L_i$ is a geometric random variable, shifting the interval of interest by $s_i - t_i$ introduces only a simple multiplicative factor:

$$\ldots = 2\left(1 - \frac{1}{L}\right)^{t_i - s_i} P\left(L_i \in \left[s_i, s_i + \frac{s_i - t_i}{2} - 1\right]\right) \le \ldots$$

We enlarge the target interval from length $(s_i - t_i)/2$ to $t_{i+1} - s_i$ relying on (5). Clearly by this the probability cannot decrease:

$$\ldots \le 2\left(1 - \frac{1}{L}\right)^{t_i - s_i} P(L_i \in [s_i, t_{i+1} - 1]).$$

For the coefficient at the end of this chain we use (6) to get

$$2\left(1 - \frac{1}{L}\right)^{t_i - s_i} \le 2\left(1 - \frac{1}{L}\right)^{-2CL \log^{1-\beta} k - 1} < 2\left(1 + \frac{2}{L}\right) \exp(3C \log^{1-\beta} k) < 3$$

for $k, L$ large enough. Here we use that $(1 - 1/L)^{-2L}$ is approximately $\exp(2)$, so it is certainly below $\exp(3)$ for $L$ large enough, and that $\log^{1-\beta} k \to 0$ as $\beta > 1$.

Substituting this last bound into the chain starting at (7) leads to (4). This is enough to complete the proof as we have seen before. □


To come back to the proof of Proposition 11 let us choose $D = [-3\delta, 3\delta]$. Whenever we have $\{L_i x\} \notin D$ it implies $\cos^+(L_i x) < 1 - 2\delta^2$. Using Lemma 12 and knowing that the $L_i$ are independent random variables we have

$$P\left( \sum_{i=1}^{m} \cos^+(L_i x) > m - 2\delta^2 \right) \le P(\{L_i x\} \in D)^m \le \left(\frac{3}{4}\right)^m. \tag{8}$$

This is the type of probability bound we are looking for, but only for a single $x$. Next we extend it to all $x \in A$ simultaneously, where $A$ is the interval of arguments of interest (3). As an intermediate step, take a grid $x_j$ of resolution $\varepsilon = 2\delta^2/(mCL \log k)$ on $A$. Using the union bound for the grid we see

$$P\left( \sup_j \sum_{i=1}^{m} \cos^+(L_i x_j) > m - 2\delta^2 \right) \le \sum_j P\left( \sum_{i=1}^{m} \cos^+(L_i x_j) > m - 2\delta^2 \right) \le \frac{2\pi m CL \log k}{2\delta^2} \left(\frac{3}{4}\right)^m = \pi CL \log^{2\beta + \alpha + 1} k \left(\frac{3}{4}\right)^{\log^\alpha k}. \tag{9}$$

In this final term $(3/4)^{\log^\alpha k}$ decays super-polynomially in $k$ as $\alpha > 1$. All other factors have polynomial or lower rate in $k$. Consequently the right hand side of (9) becomes arbitrarily small as $k, L$ grow. In particular, it will go below $1/3$.

At this point we have the probability estimate for the grid; we need to extend this to the complete interval $A$, introduced in (3). We show that for $k, L$ large enough we have

$$P\left( \sup_{x \in A} \sum_{i=1}^{m} \cos^+(L_i x) > m - \delta^2 \right) \le P\left( \sup_j \sum_{i=1}^{m} \cos^+(L_i x_j) > m - 2\delta^2 \right). \tag{10}$$

Indeed, for any $x \in A$ there is a grid point $x_j$ at most $\varepsilon/2$ away. As the derivative of $\cos^+$ stays within $[-1, 1]$ and $L_i \le CL \log k$, the change of the sum when moving to $x_j$ from $x$ is at most

$$\frac{\varepsilon}{2} \cdot m \cdot CL \log k = \frac{\delta^2}{mCL \log k} \cdot m \cdot CL \log k = \delta^2.$$

Therefore when the sum on the left hand side of (10) is at least $m - \delta^2$ for a certain $x \in A$ then there also must be a grid point $x_j$ for which the sum is at least $m - 2\delta^2$. The inclusion of the events shows the inequality for the probabilities. □

Proof of Theorem 7. Choose $C > c + 1$. Let us assume $\max_i L_i \le CL \log k$. This not holding is an exceptional event of probability $O(k^{-c})$ as shown in Lemma 8. In order to exclude the roots from all of $R_\gamma$, we split this region into three parts.

When $0 < z < 1$ is a positive real, it cannot be a solution according to Lemma 9. When $z$ has a small argument, that is, $|\arg(z)| < \pi/(CL \log k)$, we refer to Lemma 10 to confirm that $z$ cannot be a root, $q(z) \ne k$.

The remaining case is when $z$ has a large argument, that is, $\arg(z) \in A$. We aim to bound $\Re q(z)$. On one hand, we estimate the magnitude of the terms $z^{-L_i}$. Then we combine these with the cosines of the arguments to find the contribution to the real part of $q(z)$. Here we rely on Proposition 11, but let us make this precise.

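We need to check the magnitude of the terms $z^{-L_i}$. Knowing $|z| \ge 1 - 1/(L \log^\gamma k)$ and $L_i \le CL \log k$, for $k, L$ large enough we have

$$|z^{-L_i}| \le \left(1 - \frac{1}{L \log^\gamma k}\right)^{-CL \log k} \le \left(1 + \frac{2}{L \log^\gamma k}\right)^{CL \log k} < \exp(2C \log^{1-\gamma} k) < 1 + \frac{4C}{\log^{\gamma-1} k}.$$

Considering $\Re q(z)$ with $x = \arg(z)$, this gives

$$\Re q(z) = \sum_{i=1}^{k} |z^{-L_i}| \cos(L_i x) \le \sum_{i=1}^{k} |z^{-L_i}| \cos^+(L_i x) \le \left(1 + \frac{4C}{\log^{\gamma-1} k}\right) \sum_{i=1}^{k} \cos^+(L_i x). \tag{11}$$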


Let us arrange the $k$ elements of this sum into groups of $m = \log^\alpha k$ elements, resulting in $k/m$ such groups. Here we omit the minor details dealing with $m$, $k/m$ not being integers in general. Let the sums of these groups be $S_1, S_2, \ldots, S_{k/m}$. According to Proposition 11 we have

$$P(S_j > m - \delta^2) \le \frac{1}{3},$$

and these events are independent. Therefore the number of such events happening follows a $\mathrm{Binom}(k/m, r)$ distribution for some $r \le 1/3$. From standard Chernoff bounds we see that

$$P\left( \mathrm{Binom}\left(\frac{k}{m}, r\right) > \frac{k}{2m} \right) \le P\left( \mathrm{Binom}\left(\frac{k}{m}, \frac{1}{3}\right) > \frac{k}{2m} \right) \le \left(\frac{8}{9}\right)^{k/(2m)} < k^{-c}$$

for $k$ large enough. Consequently, at most $k/(2m)$ of the $S_j$ are large (with the exception of an event with probability $O(k^{-c})$). In this case we have


$$\sum_{i=1}^{k} \cos^+(L_i x) = \sum_{j=1}^{k/m} S_j \le \frac{k}{2m} m + \frac{k}{2m}(m - \delta^2) = k - \frac{k\delta^2}{2m}.$$

Plugging this back into (11) we arrive at

$$\Re q(z) \le \left(1 + \frac{4C}{\log^{\gamma-1} k}\right)\left(k - \frac{k\delta^2}{2m}\right) = k\left(1 + \frac{4C}{\log^{\gamma-1} k}\right)\left(1 - \frac{1}{2\log^{2\beta+\alpha} k}\right).$$

Let us choose $\gamma > 2\beta + \alpha + 1$. With such a choice, the term $k$ above gets multiplied by a coefficient lower than 1 for $k$ large enough. This shows $\Re q(z) < k$, which implies that $z$ is not a solution of $q(z) = k$. This a.a.s. holds simultaneously for all $z \in R_\gamma$ with $\arg(z) \in A$.

Regarding the parameters, previously for Proposition 11 we only needed to ensure $\alpha, \beta > 1$. Therefore we can apply Proposition 11 and this reasoning for $\alpha = \beta = 1 + \epsilon$, $\gamma = 4 + 4\epsilon$ for any $\epsilon > 0$, eventually allowing any $\gamma > 4$. During the proof, we had two small exceptional events, both having probability $O(k^{-c})$. This confirms the theorem with the condition on $\gamma$ and with the probability bound on the exceptional cases. □

4 Mixing rates for $B_n(k)$

Theorem 7 of the previous section tells about the eigenvalues of the pure drift Markov chain on the graph model $B(L, k)$. It guarantees the absence of eigenvalues with large absolute value (except at 1) with high probability. Based on this result we get:

Theorem 13. Assume $k, L \to \infty$ while $\rho_l < \log L / \log k < \rho_u$ for some constants $0 < \rho_l < \rho_u < \infty$. Then for any $\gamma > 4$ a.a.s. we have the following bound on the mixing rate for $B(L, k)$:

$$\lambda > \frac{1}{L \log^\gamma k}.$$

Proof. This lower bound is a direct consequence of Theorem 7 and the definitions of $R_\gamma$ and the mixing rate $\lambda$. □

We now translate this result to the other graph model $B_n(k)$, where the total number of nodes is fixed beforehand.

Theorem 14. Assume $n, k \to \infty$ while $\rho_l < \log n / \log k < \rho_u$ for some constants $0 < \rho_l < \rho_u < \infty$. Then for any $\gamma > 4$ a.a.s. we have the following bound on the mixing rate for $B_n(k)$:

$$\lambda > \frac{k}{n \log^\gamma k}.$$

Proof. We obtain the model $B_n(k)$ by starting from $B(n/k, k)$ and then conditioning on the event that the node count of the graph is exactly $n$. Let us call this event $M(n, k)$. We compute the probability of this event.

The node count of $B(n/k, k)$ is calculated by adding up $k$ independent geometric random variables. This is closely related to the negative binomial distribution, therefore we can interpret it as follows. We perform independent trials of a binary test that has a failure probability $k/n$. We wait until the $k$th failure and check whether the total number of trials is $n$. We arrive at this event if there is a failure exactly at the $n$th trial, and we had $k - 1$ failures before. For any given configuration of $k$ positions for failures and $n - k$ for successful trials, the probability is

$$\left(\frac{k}{n}\right)^k \left(1 - \frac{k}{n}\right)^{n-k}.$$

We have to include any configuration of the first $k - 1$ failures (the last one is fixed). Summing these up we get the overall probability

$$P(M(n, k)) = \binom{n-1}{k-1} \left(\frac{k}{n}\right)^k \left(1 - \frac{k}{n}\right)^{n-k} = \binom{n-1}{k-1} \frac{k^k (n-k)^{n-k}}{n^n}. \tag{12}$$


We develop a simple asymptotic estimate for this probability. From the Stirling formula we know that

$$\lim_{n \to \infty} \frac{n!}{\sqrt{2\pi n}\,(n/e)^n} = 1.$$

For conciseness, we will use the $\approx$ relation if the ratio of the two quantities is 1 in the limit. In this spirit we get

$$\binom{n}{k} \approx \frac{\sqrt{2\pi n}\,(n/e)^n}{\sqrt{2\pi k}\,(k/e)^k \sqrt{2\pi(n-k)}\,((n-k)/e)^{n-k}} = \sqrt{\frac{n}{2\pi k(n-k)}}\, \frac{n^n}{k^k (n-k)^{n-k}}.$$

Let us plug this back into (12), while noting $\binom{n-1}{k-1} = \frac{k}{n}\binom{n}{k}$:

$$P(M(n, k)) \approx \frac{k}{n} \sqrt{\frac{n}{2\pi k(n-k)}} = \sqrt{\frac{k}{2\pi n(n-k)}}.$$

As a very crude bound we get for $n, k$ large enough that

$$P(M(n, k)) > \frac{1}{n}. \tag{13}$$

We can now use Theorem 7, as the conditions currently imposed imply the conditions of Theorem 7. Therefore we get that the probability of having a root in the forbidden ring is bounded as $O(L^{-2}k^{-2}) = O(n^{-2})$. Also, we have just seen in (13) that starting from $B(n/k, k)$, we get a graph with $n$ nodes with probability at least $1/n$ for $n, k$ large enough.

Consequently, the probability of having a root in the forbidden ring is negligible even compared to the probability of the event $M(n, k)$. Conditioning on $M(n, k)$ means generating an instance of $B_n(k)$. We have just seen that the probability of getting a graph with a root in $R_\gamma$ asymptotically vanishes even compared to this event. This completes the proof. □
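
The probability of the conditioning event can be evaluated directly and compared with its asymptotic estimate. The sketch below is an illustration with hypothetical function names; it uses the negative binomial formula from the proof, computed through logarithms to avoid overflow, and the estimate $\sqrt{k/(2\pi n(n-k))}$. It assumes $1 \le k < n$.

```python
import math

def prob_exact_node_count(n, k):
    """P(M(n, k)): k i.i.d. Geo(k/n) arc lengths sum to exactly n."""
    p = k / n
    # log of binomial(n - 1, k - 1) via the log-gamma function
    log_binom = math.lgamma(n) - math.lgamma(k) - math.lgamma(n - k + 1)
    return math.exp(log_binom + k * math.log(p) + (n - k) * math.log1p(-p))

def stirling_estimate(n, k):
    """Asymptotic estimate sqrt(k / (2*pi*n*(n - k))) of P(M(n, k))."""
    return math.sqrt(k / (2 * math.pi * n * (n - k)))
```

For $n = 3$, $k = 2$ the exact value is $8/27$, which can be checked by hand: the two arc lengths must be $(1, 2)$ or $(2, 1)$. For larger parameters such as $n = 1000$, $k = 30$ the two functions agree to within a few percent and both exceed $1/n$.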


5 Simulations 


Following the asymptotic theoretical results we perform complementing simulations to analyze the tightness of the bounds obtained. We also explore numerically the next step of research that is not yet treated analytically.

The mixing results we have are exciting as we see a strong speedup compared to the similar reversible Markov chain with transition matrix $\bar{P} = (P + P^T)/2$. By this we set the transition probabilities on every edge to be equal in the two directions. For this Markov chain, if the initial distribution is concentrated in the middle of the longest arc, the Central Limit Theorem ensures that even after $\Omega(L^2 \log^2 k)$ steps the probability of not leaving the arc is bounded away from 0. Consequently we get a lower bound of the same order for the mixing time, which in turn translates to the mixing rate bound

$$\lambda < \frac{C}{L^2 \log^2 k}, \tag{14}$$

which is a square factor worse than our new results for the non-reversible Markov chain.

Simulations are in line with the speedup we see when comparing (14) with Theorem 13. Figure 3 is a log-log histogram showing the decrease of $\lambda$ as the node count $n$ increases. The histogram presents the simulation results for the non-reversible and reversible Markov chains, and we do observe the strong separation predicted by the theoretical results. The stripe on the top presents $\lambda$ for the non-reversible Markov chains while the bottom one corresponds to the reversible ones. Figure 3 is based on 200,000 random Markov chains with $n$ ranging from 54 to 2980 and with $k = \lfloor \sqrt{n} \rfloor$. As we are interested in the typical behavior of these randomized Markov chains, we discarded the top and bottom 5% of the results for every $n$ considered.

The two types of Markov chains we compared can be seen as the extremal setups: either the asymmetry is so strong that steps are deterministic along the cycle, or we have perfect symmetry. There is however a full spectrum of intermediate situations, and one may wonder which level of asymmetry is optimal. We have seen that full asymmetry is better in terms of mixing performance than full symmetry. In Figure 4, starting from the reversible Markov chain, we gradually change the transition probabilities along the cycle until we reach the current extreme asymmetric case. Specifically, for $1/2 \le q \le 1$ we set the transition matrix $P_q = qP + (1 - q)P^T$ and compute the mixing rate of the resulting Markov chain. Here we have $P_{1/2} = \bar{P}$, $P_1 = P$ as expected. We perform simulations for $B_{500}(10)$ and $B_{500}(50)$. In both cases, 8000 random graphs were generated and the mixing rates were computed for all graphs and for $q$ moving along $[1/2, 1]$. Again, the top and bottom 5% were discarded. The means of the resulting mixing rates are presented in Figure 4 together with the standard deviations. The figures show that the optimal choice is near the extremal non-reversible case, confirming our concept. Still, interestingly, a minor offset towards the reversible version further increases the mixing rate. Intuitively the two effects of the modification match well: the small loss in the speed of moving along the cycle is well compensated by the local mixing introduced.
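
The interpolating family $P_q$ is easy to construct explicitly. The following minimal sketch, with hypothetical function names and fixed example arc lengths, builds the pure drift matrix and the mixture $qP + (1 - q)P^T$; it does not compute mixing rates, which would require an eigenvalue solver for general matrices.

```python
def pure_drift_matrix(lengths):
    """Pure drift chain: step along each arc, then jump to a uniform a_j."""
    k, n = len(lengths), sum(lengths)
    starts = [sum(lengths[:i]) for i in range(k)]
    P = [[0.0] * n for _ in range(n)]
    for start, length in zip(starts, lengths):
        for node in range(start, start + length - 1):
            P[node][node + 1] = 1.0
        for a in starts:
            P[start + length - 1][a] = 1.0 / k
    return P

def interpolate(P, q):
    """Return P_q = q * P + (1 - q) * P^T as a list of rows."""
    n = len(P)
    return [[q * P[i][j] + (1 - q) * P[j][i] for j in range(n)]
            for i in range(n)]
```

Since $P$ is doubly stochastic, every $P_q$ is a stochastic matrix with uniform stationary distribution; $q = 1/2$ gives a symmetric matrix and $q = 1$ returns $P$ itself.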

The analytic treatment of the intermediate Markov chains for $q \ne 1/2, 1$ brings new challenges, as the non-reversible feature is still present while we lose the deterministic nature of the movement along the cycle.

Figure 3: Histograms for the mixing rates $\lambda$ for the Markov chains on the graphs $B_n(\lfloor \sqrt{n} \rfloor)$. The upper stripe corresponds to non-reversible Markov chains while the lower one to the reversible variants.


6 Conclusions 


We have seen in Theorem 13 and Theorem 14 that for both models $B_n(k)$ and $B(L, k)$ the mixing rate of the non-reversible Markov chain considered is much higher than that of the similar reversible one. The results confirm that the simultaneous application of adding long distance edges and setting the Markov chain to be non-reversible dramatically improves the mixing rate. We believe this phenomenon is promising and could provide similar speedup effects for other reference graphs, other methods to add random edges and other means of introducing non-reversibility.

We have also seen numerically in Figure 4 that being fully non-reversible is not necessarily optimal in this context, even though it is significantly better than being fully reversible.

Therefore one of the open questions is to find the optimal Markov chain among the intermediate cases. Another goal for future research is to consider the more realistic situation where the hubs do not have such a high number of connections, for instance, by replacing the complete graph on the selected $n^\sigma$ nodes with a random matching on them. Simulations in [6], similar to the ones presented here, suggest that a similar speedup is to be expected.